Kubernetes

What is Kubernetes?

Kubernetes (also known as K8s) is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications. It provides a scalable and resilient framework for running applications in a distributed environment, abstracting away the complexities of infrastructure management.

Kubernetes itself is not a ready-to-go platform; it is a framework within which users can build a platform. Typically, users need to integrate many products and services to provide full functionality. The types of products and services commonly added to Kubernetes give an idea of what is involved: storage (AWS, Ceph); deployment tools (Terraform); containers and templating/service registration (Docker, Rancher, Helm, JFrog); logging/metrics (Grafana, Prometheus); CI/CD pipeline tools (Jenkins, GitLab CI/CD, or Tekton); and load balancing (NetScaler).


Why is Kubernetes needed?

Containers, for example as implemented by Docker, are a good way to bundle and run applications. In a production environment, you need to manage the containers that run the applications and ensure that there is no downtime. For example, if a container goes down, another container needs to start to take over its functionality. This is where Kubernetes comes in.

Kubernetes provides a framework to run distributed systems resiliently. It takes care of scaling and failover for your application, provides deployment patterns, and more.

A few of the key capabilities users typically implement via Kubernetes are:

  • Service discovery and load balancing: Kubernetes can expose a service via a DNS name and balance/distribute network traffic across the Pods backing it to keep the deployment stable.
  • Storage orchestration: With Kubernetes, you can automatically mount any storage system of your choice.
  • Automated rollouts and rollbacks: You can describe the desired state for your containers using Kubernetes, and it can change the actual state to the desired state at a controlled rate.
  • Automatic bin packing: You provide Kubernetes with a cluster of nodes that it can use to run containerized tasks. You tell Kubernetes how much CPU and memory (RAM) each container needs. Kubernetes can fit containers onto your nodes to make the best use of your resources.
  • Self-healing: Kubernetes restarts containers that fail, replaces containers, kills containers that don’t respond to health checks, etc. (a minimal Pod manifest illustrating resource requests and a liveness probe is sketched after this list).
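
To make two of these capabilities concrete, here is a minimal Pod manifest sketch; the name web, the image nginx:1.25, and all numeric values are illustrative placeholders, not recommendations. The resource requests inform the scheduler’s bin packing, and the liveness probe drives self-healing restarts:

    apiVersion: v1
    kind: Pod
    metadata:
      name: web                      # illustrative name
    spec:
      containers:
        - name: web
          image: nginx:1.25          # illustrative image
          resources:
            requests:                # used by the scheduler for bin packing
              cpu: "250m"
              memory: "128Mi"
            limits:                  # hard caps enforced at runtime
              cpu: "500m"
              memory: "256Mi"
          livenessProbe:             # a failed probe triggers a container restart (self-healing)
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 10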

What are the key concepts and components of Kubernetes?

There is a large amount of nomenclature and terminology around the fundamental concepts and components in Kubernetes. Some essential terms to understand the architecture and operation of a Kubernetes environment are:

  • Kubernetes Cluster: It is a set of nodes that run containerized applications. Containerizing applications packages an app with its dependencies and necessary services. Containers are more lightweight and flexible than virtual machines.
  • Pod: Pods are the smallest deployable units of computing that you can create and manage in Kubernetes. A Pod is a group of one or more containers, with shared storage and network resources, and a specification for how to run the containers. A Pod’s contents are always co-located and co-scheduled.
  • Node: It may be a virtual or physical machine. The components on a node include the kubelet, a container runtime, and the kube-proxy. Each node is managed by the K8s control plane and contains the services necessary to run Pods.
  • Container Image: It is a ready-to-run software package containing everything needed to run an application: the code and any runtime it requires, application and system libraries, and default values for any essential settings.
  • Container Runtime: It is the software that is responsible for running containers. Kubernetes supports container runtimes such as containerd, CRI-O, etc.
  • DaemonSet: It ensures that all Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected.
  • Namespace: It provides a mechanism for isolating groups of resources within a single cluster. Names of resources need to be unique within a namespace.
  • Control Plane: The control plane is the brain of the Kubernetes cluster. It manages the cluster's overall state, orchestrates scheduling and deployment of pods, and monitors their health. It includes components such as the API server, scheduler, and controller manager.
  • ReplicaSet: A ReplicaSet ensures that a specified number of pod replicas are running at all times. It helps maintain the desired level of availability for an application by creating or deleting Pods to match the specified replica count (rule-based autoscaling is handled separately, e.g., by the Horizontal Pod Autoscaler).
  • Deployment: A deployment is a higher-level abstraction that manages the lifecycle of pods and replica sets. It provides declarative updates for Pods. It allows for rolling updates, rollback capabilities, and scaling of applications.
  • Service: A service is an abstraction that defines a logical set of pods and a policy to access them. It provides a stable network endpoint for accessing pods and allows for load balancing and service discovery within the cluster (a combined Deployment and Service manifest is sketched after this list).
  • Ingress: An ingress is an API object that manages external access to services within a cluster, typically over HTTP and HTTPS. It provides a way to route incoming traffic, apply load-balancing rules, and configure SSL/TLS termination.
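
To show how several of these concepts fit together, here is a minimal sketch of a Deployment and a Service; the app name hello and the image nginx:1.25 are illustrative placeholders. The Deployment creates and owns a ReplicaSet that keeps three Pod replicas running, and the Service load-balances traffic across those Pods via the label selector:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: hello                    # illustrative name
    spec:
      replicas: 3                    # the underlying ReplicaSet maintains 3 Pods
      selector:
        matchLabels:
          app: hello
      strategy:
        type: RollingUpdate          # enables rolling updates and rollbacks
      template:
        metadata:
          labels:
            app: hello
        spec:
          containers:
            - name: hello
              image: nginx:1.25      # illustrative image
              ports:
                - containerPort: 80
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: hello
    spec:
      selector:
        app: hello                   # routes traffic to Pods carrying this label
      ports:
        - port: 80
          targetPort: 80

Applying this manifest (e.g., with kubectl apply -f) creates both objects; deleting one of the Pods by hand is a quick way to watch the ReplicaSet replace it.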

How is Kubernetes deployed?

Whilst Kubernetes can be deployed on-premises or in the cloud by users themselves, it is becoming increasingly common for users to leverage managed services or enterprise products built on top of and around the open-source framework. Many such services and supported enterprise products augment the functionality of basic Kubernetes and remove much of the cumbersome integration and maintenance work needed to achieve a functioning platform.

Kubernetes can be deployed on various cloud platforms and via container orchestration solutions, such as Amazon Elastic Kubernetes Service (EKS), Microsoft Azure Kubernetes Service (AKS), and Red Hat OpenShift. These platforms provide managed Kubernetes services, greatly simplifying the deployment and management process.

For example, with Amazon EKS, users can create a Kubernetes cluster via the AWS Management Console, CLI, or API. EKS handles the underlying infrastructure, including control plane setup, scaling, and patching. Users then deploy their applications onto the cluster using Kubernetes manifests or container images stored in Amazon Elastic Container Registry (ECR).
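
As an illustrative sketch of the declarative route, assuming the widely used eksctl CLI (a third-party tool for EKS, not part of Kubernetes itself; the cluster name, region, and instance type below are placeholders), a cluster can be described in a config file and created with eksctl create cluster -f cluster.yaml:

    apiVersion: eksctl.io/v1alpha5   # eksctl ClusterConfig schema
    kind: ClusterConfig
    metadata:
      name: demo-cluster             # placeholder cluster name
      region: us-east-1              # placeholder region
    managedNodeGroups:
      - name: workers
        instanceType: m5.large       # placeholder instance type
        desiredCapacity: 2           # number of worker nodes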

Azure AKS offers a similar managed Kubernetes service on the Microsoft Azure platform. Users can provision a cluster through the Azure portal, CLI, or API. AKS handles the management of master nodes, scaling, and integration with other Azure services. Applications can be deployed to AKS using Kubernetes manifests or by pulling container images from Azure Container Registry (ACR).

Red Hat OpenShift is an enterprise-supported Kubernetes distribution that provides additional features and capabilities on top of Kubernetes. It can be deployed on-premises, on public clouds, or in hybrid environments. OpenShift simplifies the deployment process through its web console, CLI, or APIs. It offers a built-in container registry, networking, monitoring, and developer tooling, providing a complete platform for deploying and managing applications.

In each case, the deployment process typically involves setting up the Kubernetes cluster, configuring networking and security, and deploying applications using Kubernetes manifests or container images (often Docker). The managed Kubernetes services take care of the underlying infrastructure, including high availability, scaling, and maintenance (e.g., security patching and versioning upgrades), allowing users to focus on their applications and workflows.


Why is monitoring Kubernetes important?

In today’s technological landscape, Kubernetes plays a vital role. It is one of the most widely used and best-known DevOps tools, used extensively by organizations of all sizes, from start-ups to large enterprises. Large organizations often run multiple Kubernetes clusters spread across hybrid and multi-cloud environments, combining on-premises infrastructure with one or more cloud providers.

Although Kubernetes solves many existing IT problems, it comes with its own set of complexities, and the complexity increases when you have multiple Kubernetes clusters to look after. Hence, monitoring these Kubernetes clusters efficiently is critical. If you neglect proactive monitoring and leave your Kubernetes environment unmonitored, you will learn about problems only after they have occurred.


What are the key metrics to monitor for Kubernetes?

There are a number of key areas and metrics within Kubernetes that our customers leverage eG Enterprise to monitor proactively. Metric thresholds, anomaly detection, and alerting are configured out-of-the-box for our users. If you are using an alternative monitoring or observability tool, we advise implementing alerting on the following key metric categories:

  1. Cluster CPU and Memory Utilization: This enables you to make decisions such as whether to add more CPU/memory to an existing node or whether to adjust the requests and limits for pods.
  2. CrashLoopBackOff Events: A CrashLoopBackOff status for a pod means the pod starts, crashes, starts again, and then crashes again, which makes any dependent applications unstable.
  3. Persistent Volume Failures: Kubernetes supports stateful workloads by using Persistent Volumes; databases and similar applications usually rely on Persistent Volumes being available.
  4. Horizontal Pod Autoscaler (HPA) Issues: HPA allows users to scale the number of pods up or down based on target values (a “desired state”), for example CPU utilization. To set an appropriate target value, you need to monitor the performance of the application and understand its behavior (a sample HPA manifest is sketched after this list).
  5. Node Conditions: The node health (Condition) is determined by several metrics like CPU, Memory, Network, Disk, Pod capacity etc.
  6. Node Events: A container can fail to run due to many node issues, for example “Out of Memory”; monitoring node events will detect many such issues.
  7. Deployment Issues: Deployment in Kubernetes involves managing the lifecycle of pods, including upgrades. When there are fewer available pods than desired pods for several minutes, a deployment is considered unhealthy.
  8. Pods with Pending Status: If a pod is in “Pending” status, it cannot be scheduled onto a node. It is important to monitor and diagnose the reasons for the pending status.
  9. DaemonSet Issues: A Kubernetes administrator should monitor that the numbers of available and desired DaemonSet pods are the same. If there is a mismatch, one or more DaemonSet pods have failed.
  10. Cluster Health Conditions: Monitor to identify whether any node in the cluster is experiencing Disk Pressure, Memory Pressure, or PID Pressure (the node has too many processes/containers and is unable to create new ones), or reports its network as unavailable (NetworkUnavailable).
  11. Tracking Container Images: Containers can fail to run if the required images are not present in the repository. Containers can also misbehave if an outdated image is pulled.
  12. Kubernetes Control Plane Monitoring: The Kubernetes control plane has many essential services, such as etcd, the scheduler, the controller manager, and the API server, which must be monitored.
  13. Pod Resource Monitoring: It is important to monitor CPU and Memory limits and usage for each container in the pod to avoid OOM (Out of Memory) and comparable errors.
  14. Service Types: The Kubernetes Service construct manages the load-balancing configuration for pods, which enables pods to scale easily. It is best practice to monitor and audit the type of service assigned to a pod to reduce security risks.
  15. Garbage Collection Monitoring: If GC behavior is errant or suboptimal, the performance of nodes can degrade.
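
To illustrate item 4 above, here is a minimal HorizontalPodAutoscaler sketch using the autoscaling/v2 API; the target Deployment name (hello), the replica bounds, and the 70% CPU threshold are illustrative placeholders, not recommendations:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: hello-hpa                # illustrative name
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: hello                  # the Deployment to scale (placeholder)
      minReplicas: 2
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70 # illustrative "desired state" target

If average CPU utilization across the target Pods stays above the threshold, the HPA adds replicas (up to maxReplicas); monitoring should verify that scaling events actually track the underlying metric.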

Further details on monitoring these metrics are included in Kubernetes Monitoring & OpenShift Monitoring Metrics | eG (eginnovations.com).


How can AIOps improve Kubernetes monitoring?

An AIOps (Artificial Intelligence for IT Operations) monitoring platform such as eG Enterprise can significantly enhance and automate Kubernetes monitoring and alerting. eG Enterprise leverages machine learning and advanced statistical analytics to automate and optimize the monitoring process.

eG Enterprise automates the deployment of monitoring and removes the need for manual configuration, ensuring that a K8s environment and the applications deployed on it are continually monitored at scale without manual intervention. Features include:

  • Built-in auto-discovery ensures monitoring is implemented in autoscaling use cases
  • Implements vendor-recommended best practices for infrastructure monitoring
  • Collects metrics and detects problems well before they become business-impacting
  • Auto-baselines metrics for every component and alerts when abnormal usage or behavior patterns are noticed
  • Differentiates causes of problems from effects, thereby facilitating faster problem resolution and lower MTTR
  • Built-in trend analysis and forecasting capabilities